{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用 TGI 的消息 API 从 OpenAI 迁移到 Open LLMs\n",
    "\n",
    "_作者: [Andrew Reed](https://huggingface.co/andrewrreed)_\n",
    "\n",
    "这个 notebook 展示了如何轻松地从 OpenAI 模型过渡到 Open LLMs，而无需重构任何现有代码。\n",
    "\n",
    "[文本生成推理（TGI）](https://github.com/huggingface/text-generation-inference)现在提供了一个[消息 API](https://huggingface.co/blog/tgi-messages-api)，使其与 OpenAI 的聊天完成 API 的直接兼容。这意味着任何使用 OpenAI 的模型（通过 OpenAI 客户端库或像 LangChain 或 LlamaIndex 这样的第三方工具）的现有脚本都可以直接替换为使用运行在 TGI 端点上的任何开源 LLM！\n",
    "\n",
    "这允许你快速测试并受益于开源模型提供的众多优势。例如：\n",
    "\n",
    "- 对模型和数据的完全控制和透明度\n",
    "\n",
    "- 不再担心速率限制\n",
    "\n",
    "- 能够根据你的具体需求完全定制系统\n",
    "\n",
    "在这个 notebook 中，我们将向你展示具体流程：\n",
    "\n",
    "1. [使用 TGI 创建推理端点来部署模型](#section_1)\n",
    "2. [使用 OpenAI 客户端库查询推理端点](#section_2)\n",
    "3. [将端点与 LangChain 和 LlamaIndex 工作流程集成](#section_3)\n",
    "\n",
    "**让我们开始吧！**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 初始化设置\n",
    "\n",
    "首先，我们需要安装依赖项和设置一个 HF API 密钥。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --upgrade -q huggingface_hub langchain langchain-community langchainhub langchain-openai llama-index chromadb bs4 sentence_transformers torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "# enter API key\n",
    "os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = HF_API_KEY = getpass.getpass()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"section_1\"></a>\n",
    "\n",
    "## 1. 创建一个推理端点\n",
    "\n",
    "一开始，让我们使用 TGI 将[Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO)，一个微调的 Mixtral 模型，部署到推理端点。\n",
    "\n",
    "我们只需通过 UI 的[几次点击](https://ui.endpoints.huggingface.co/new?vendor=aws&repository=NousResearch%2FNous-Hermes-2-Mixtral-8x7B-DPO&tgi_max_total_tokens=32000&tgi=true&tgi_max_input_length=1024&task=text-generation&instance_size=2xlarge&tgi_max_batch_prefill_tokens=2048&tgi_max_batch_total_tokens=1024000&no_suggested_compute=true&accelerator=gpu&region=us-east-1)，就可以部署模型，或者利用 `huggingface_hub` Python 库以编程方式创建和管理推理端点。\n",
    "\n",
    "在这里，我们将使用 Hub 库，通过指定端点名称和模型仓库，以及 `text-generation` 任务。在这个例子中，我们使用 `protected` 类型，因此访问部署的模型将需要一个有效的 Hugging Face token。我们还需要配置硬件要求，如供应商、地区、加速器、实例类型和大小。你可以使用[this API call](https://api.endpoints.huggingface.cloud/#get-/v2/provider)查看可用的资源选项列表，并在目录中[这里](https://ui.endpoints.huggingface.co/catalog)查看为选定模型推荐配置。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "running\n"
     ]
    }
   ],
   "source": [
    "from huggingface_hub import create_inference_endpoint\n",
    "\n",
    "endpoint = create_inference_endpoint(\n",
    "    \"nous-hermes-2-mixtral-8x7b-demo\",\n",
    "    repository=\"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\",\n",
    "    framework=\"pytorch\",\n",
    "    task=\"text-generation\",\n",
    "    accelerator=\"gpu\",\n",
    "    vendor=\"aws\",\n",
    "    region=\"us-east-1\",\n",
    "    type=\"protected\",\n",
    "    instance_type=\"p4de\",\n",
    "    instance_size=\"2xlarge\",\n",
    "    custom_image={\n",
    "        \"health_route\": \"/health\",\n",
    "        \"env\": {\n",
    "            \"MAX_INPUT_LENGTH\": \"4096\",\n",
    "            \"MAX_BATCH_PREFILL_TOKENS\": \"4096\",\n",
    "            \"MAX_TOTAL_TOKENS\": \"32000\",\n",
    "            \"MAX_BATCH_TOTAL_TOKENS\": \"1024000\",\n",
    "            \"MODEL_ID\": \"/repository\",\n",
    "        },\n",
    "        \"url\": \"ghcr.io/huggingface/text-generation-inference:sha-1734540\",  # must be >= 1.4.0\n",
    "    },\n",
    ")\n",
    "\n",
    "endpoint.wait()\n",
    "print(endpoint.status)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "部署启动需要几分钟时间。我们可以使用 `.wait()` 工具来阻塞运行线程，直到端点达到最终的“运行”状态。一旦运行，我们可以在 UI 播放器中确认其状态并试用：\n",
    "\n",
    "![IE UI Overview](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/messages-api/endpoint-overview.png)\n",
    "\n",
    "太好了，现在我们有一个可用的端点！\n",
    "\n",
    "_注意：使用 `huggingface_hub` 部署时，默认情况下，在15分钟空闲时间后，你的端点会自动缩放到零，以在非活动期间优化成本。查看[ Hub Python 库文档](https://huggingface.co/docs/huggingface_hub/guides/inference_endpoints)以了解可用于管理端点生命周期的所有功能。_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"section_2\"></a>\n",
    "\n",
    "## 2. 使用 OpenAI 客户端库查询推理端点\n",
    "\n",
    "如上所述，由于我们的模型托管在 TGI 上，现在支持消息 API，这意味着我们可以直接使用熟悉的 OpenAI 客户端库来查询它。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用 Python 客户端\n",
    "\n",
    "下面的例子展示了如何使用[ OpenAI Python 库](https://github.com/openai/openai-python)进行这种转换。只需将 `<ENDPOINT_URL>` 替换为你的端点 URL（确保包含 `v1/` 后缀），并将 `<HF_API_KEY>` 字段填充为有效的 Hugging Face 用户 token。`<ENDPOINT_URL>` 可以从推理端点的 UI 中获取，或者从我们上面使用 `endpoint.url` 创建的端点对象中获取。\n",
    "\n",
    "然后我们可以像往常一样使用客户端，传递一个消息列表以从我们的推理端点流式传输响应。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Open-source software is important due to a number of reasons, including:\n",
      "\n",
      "1. Collaboration: The collaborative nature of open-source software allows developers from around the world to work together, share their ideas and improve the code. This often results in faster progress and better software.\n",
      "\n",
      "2. Transparency: With open-source software, the code is publicly available, making it easy to see exactly how the software functions, and allowing users to determine if there are any security vulnerabilities.\n",
      "\n",
      "3. Customization: Being able to access the code also allows users to customize the software to better suit their needs. This makes open-source software incredibly versatile, as users can tweak it to suit their specific use case.\n",
      "\n",
      "4. Quality: Open-source software is often developed by large communities of dedicated developers, who work together to improve the software. This results in a higher level of quality than might be found in proprietary software.\n",
      "\n",
      "5. Cost: Open-source software is often provided free of charge, which makes it accessible to a wider range of users. This can be especially important for organizations with limited budgets for software.\n",
      "\n",
      "6. Shared Benefit: By sharing the code of open-source software, everyone can benefit from the hard work of the developers. This contributes to the overall advancement of technology, as users and developers work together to improve and build upon the software.\n",
      "\n",
      "In summary, open-source software provides a collaborative platform that leads to high-quality, customizable, and transparent software, all available at little or no cost, benefiting both individuals and the technology community as a whole.<|im_end|>"
     ]
    }
   ],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "BASE_URL = endpoint.url\n",
    "\n",
    "# init the client but point it to TGI\n",
    "client = OpenAI(\n",
    "    base_url=os.path.join(BASE_URL, \"v1/\"),\n",
    "    api_key=HF_API_KEY,\n",
    ")\n",
    "chat_completion = client.chat.completions.create(\n",
    "    model=\"tgi\",\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Why is open-source software important?\"},\n",
    "    ],\n",
    "    stream=True,\n",
    "    max_tokens=500,\n",
    ")\n",
    "\n",
    "# iterate and print stream\n",
    "for message in chat_completion:\n",
    "    print(message.choices[0].delta.content, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在幕后，TGI 的消息 API 自动使用其[聊天模板](https://huggingface.co/docs/transformers/chat_templating)将消息列表转换为模型所需的指令格式。\n",
    "\n",
    "_注意：某些 OpenAI 功能，如函数调用，与 TGI 不兼容。目前，消息 API 支持以下 chat completion 参数：`stream`、`max_new_tokens`、`frequency_penalty`、`logprobs`、`seed`、`temperature` 和 `top_p`._\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 使用 JavaScript 客户端\n",
    "\n",
    "这里是与上面相同的流式示例，但是使用了[ OpenAI Javascript/Typescript 库](https://github.com/openai/openai-node)。\n",
    "\n",
    "\n",
    "```js\n",
    "import OpenAI from \"openai\";\n",
    "\n",
    "const openai = new OpenAI({\n",
    "  baseURL: \"<ENDPOINT_URL>\" + \"/v1/\", // replace with your endpoint url\n",
    "  apiKey: \"<HF_API_TOKEN>\", // replace with your token\n",
    "});\n",
    "\n",
    "async function main() {\n",
    "  const stream = await openai.chat.completions.create({\n",
    "    model: \"tgi\",\n",
    "    messages: [\n",
    "      { role: \"system\", content: \"You are a helpful assistant.\" },\n",
    "      { role: \"user\", content: \"Why is open-source software important?\" },\n",
    "    ],\n",
    "    stream: true,\n",
    "    max_tokens: 500,\n",
    "  });\n",
    "  for await (const chunk of stream) {\n",
    "    process.stdout.write(chunk.choices[0]?.delta?.content || \"\");\n",
    "  }\n",
    "}\n",
    "\n",
    "main();\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"section_3\"></a>\n",
    "\n",
    "## 3. 与 LangChain 和 LlamaIndex 集成\n",
    "\n",
    "现在，让我们看看如何将这个新创建的端点与像 LangChain 和 LlamaIndex 这样的流行 RAG 框架一起使用。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 如何与 LangChain 一起使用\n",
    "\n",
    "要在 [LangChain](https://python.langchain.com/docs/get_started/introduction) 中使用，只需创建一个 `ChatOpenAI` 的实例，并按如下方式传递你的 `<ENDPOINT_URL>` 和 `<HF_API_TOKEN>`：\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Open-source software is important for several reasons:\\n\\n1. Transparency: Open-source software allows users to see the underlying code, making it easier to understand how the software works and identify any potential security vulnerabilities or bugs. This transparency fosters trust between users and developers.\\n\\n2. Collaboration: Open-source projects encourage collaboration among developers, allowing them to work together to improve the software, fix issues, and add new features. This collective effort can lead to')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(\n",
    "    model_name=\"tgi\",\n",
    "    openai_api_key=HF_API_KEY,\n",
    "    openai_api_base=os.path.join(BASE_URL, \"v1/\"),\n",
    ")\n",
    "llm.invoke(\"Why is open-source software important?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们能够直接利用与 OpenAI 模型相同的 `ChatOpenAI` 类。这使得所有之前的代码只需更改一行代码，就能与我们的端点一起工作。\n",
    "\n",
    "现在，让我们在简单的 RAG 流水线中使用我们的 Mixtral 模型，来回答一个关于 HF 博客内容的问题。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'context': [Document(page_content='To overcome this weakness, amongst other approaches, one can integrate the LLM into a system where it can call tools: such a system is called an LLM agent.\\nIn this post, we explain the inner workings of ReAct agents, then show how to build them using the ChatHuggingFace class recently integrated in LangChain. Finally, we benchmark several open-source LLMs against GPT-3.5 and GPT-4.', metadata={'description': 'We’re on a journey to advance and democratize artificial intelligence through open source and open science.', 'language': 'No language found.', 'source': 'https://huggingface.co/blog/open-source-llms-as-agents', 'title': 'Open-source LLMs as LangChain Agents'}),\n",
       "  Document(page_content='Since the open-source models were not specifically fine-tuned for calling functions in the given output format, they are at a slight disadvantage compared to the OpenAI agents.\\nDespite this, some models perform really well! 💪\\nHere’s an example of Mixtral-8x7B answering the question: “Which city has a larger population, Guiyang or Tacheng?”\\nThought: To answer this question, I need to find the current populations of both Guiyang and Tacheng. I will use the search tool to find this information.\\nAction:\\n{', metadata={'description': 'We’re on a journey to advance and democratize artificial intelligence through open source and open science.', 'language': 'No language found.', 'source': 'https://huggingface.co/blog/open-source-llms-as-agents', 'title': 'Open-source LLMs as LangChain Agents'}),\n",
       "  Document(page_content='Agents Showdown: how do open-source LLMs perform as general purpose reasoning agents?\\n\\t\\n\\nYou can find the code for this benchmark here.\\n\\n\\n\\n\\n\\n\\t\\tEvaluation\\n\\t\\n\\nWe want to measure how open-source LLMs perform as general purpose reasoning agents. Thus we select questions requiring using logic and the use of basic tools: a calculator and access to internet search.\\nThe final dataset is a combination of samples from 3 other datasets:', metadata={'description': 'We’re on a journey to advance and democratize artificial intelligence through open source and open science.', 'language': 'No language found.', 'source': 'https://huggingface.co/blog/open-source-llms-as-agents', 'title': 'Open-source LLMs as LangChain Agents'}),\n",
       "  Document(page_content='Open-source LLMs as LangChain Agents\\n\\t\\n\\nPublished\\n\\t\\t\\t\\tJanuary 24, 2024\\nUpdate on GitHub\\n\\nm-ric\\nAymeric Roucher\\n\\n\\n\\n\\nJofthomas\\nJoffrey THOMAS\\n\\n\\n\\n\\nandrewrreed\\nAndrew Reed\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\tTL;DR\\n\\t\\n\\nOpen-source LLMs have now reached a performance level that makes them suitable reasoning engines for powering agent workflows: Mixtral even surpasses GPT-3.5 on our benchmark, and its performance could easily be further enhanced with fine-tuning.\\n\\n\\n\\n\\n\\n\\t\\tIntroduction', metadata={'description': 'We’re on a journey to advance and democratize artificial intelligence through open source and open science.', 'language': 'No language found.', 'source': 'https://huggingface.co/blog/open-source-llms-as-agents', 'title': 'Open-source LLMs as LangChain Agents'})],\n",
       " 'question': 'According to this article which open-source model is the best for an agent behaviour?',\n",
       " 'answer': 'According to the article, Mixtral-8x7B is an open-source LLM that performs really well as a general-purpose reasoning agent. It even surpasses GPT-3.5 on the benchmark in the article.'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain import hub\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "from langchain_community.vectorstores import Chroma\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "from langchain_core.runnables import RunnableParallel\n",
    "from langchain_community.embeddings import HuggingFaceEmbeddings\n",
    "\n",
    "# Load, chunk and index the contents of the blog\n",
    "loader = WebBaseLoader(\n",
    "    web_paths=(\"https://huggingface.co/blog/open-source-llms-as-agents\",),\n",
    ")\n",
    "docs = loader.load()\n",
    "\n",
    "# declare an HF embedding model\n",
    "hf_embeddings = HuggingFaceEmbeddings(model_name=\"BAAI/bge-large-en-v1.5\")\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=200)\n",
    "splits = text_splitter.split_documents(docs)\n",
    "vectorstore = Chroma.from_documents(documents=splits, embedding=hf_embeddings)\n",
    "\n",
    "# Retrieve and generate using the relevant snippets of the blog\n",
    "retriever = vectorstore.as_retriever()\n",
    "prompt = hub.pull(\"rlm/rag-prompt\")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "rag_chain_from_docs = (\n",
    "    RunnablePassthrough.assign(context=(lambda x: format_docs(x[\"context\"])))\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "rag_chain_with_source = RunnableParallel(\n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
    ").assign(answer=rag_chain_from_docs)\n",
    "\n",
    "rag_chain_with_source.invoke(\n",
    "    \"According to this article which open-source model is the best for an agent behaviour?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 如何与 LlamaIndex 一起使用\n",
    "\n",
    "类似地，你也可以在 [LlamaIndex](https://www.llamaindex.ai/) 中使用 TGI 端点。我们将使用 `OpenAILike` 类，并通过配置一些额外的参数（即 `is_local`、`is_function_calling_model`、`is_chat_model`、`context_window`）来实例化它。\n",
    "\n",
    "_注意：上下文窗口参数应与之前为端点的 `MAX_TOTAL_TOKENS` 设置的值相匹配。_\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CompletionResponse(text='Open-source software is important for several reasons:\\n\\n1. Transparency: Open-source software allows users to see the source code, which means they can understand how the software works and how it processes data. This transparency helps build trust in the software and its developers.\\n\\n2. Collaboration: Open-source software encourages collaboration among developers, who can contribute to the code, fix bugs, and add new features. This collaborative approach often leads to faster development and', additional_kwargs={}, raw={'id': '', 'choices': [Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='Open-source software is important for several reasons:\\n\\n1. Transparency: Open-source software allows users to see the source code, which means they can understand how the software works and how it processes data. This transparency helps build trust in the software and its developers.\\n\\n2. Collaboration: Open-source software encourages collaboration among developers, who can contribute to the code, fix bugs, and add new features. This collaborative approach often leads to faster development and', role='assistant', function_call=None, tool_calls=None))], 'created': 1707342025, 'model': '/repository', 'object': 'text_completion', 'system_fingerprint': '1.4.0-sha-1734540', 'usage': CompletionUsage(completion_tokens=100, prompt_tokens=18, total_tokens=118)}, delta=None)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index.llms import OpenAILike\n",
    "\n",
    "llm = OpenAILike(\n",
    "    model=\"tgi\",\n",
    "    api_key=HF_API_KEY,\n",
    "    api_base=BASE_URL + \"/v1/\",\n",
    "    is_chat_model=True,\n",
    "    is_local=False,\n",
    "    is_function_calling_model=False,\n",
    "    context_window=4096,\n",
    ")\n",
    "\n",
    "llm.complete(\"Why is open-source software important?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在我们可以使用它在类似的 RAG 流水线中。请记住，之前在推理端点选择的 `MAX_INPUT_LENGTH` 将直接影响模型可以处理的检索到的数据块（`similarity_top_k`）的数量。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    ServiceContext,\n",
    "    VectorStoreIndex,\n",
    ")\n",
    "from llama_index import download_loader\n",
    "from llama_index.embeddings import HuggingFaceEmbedding\n",
    "from llama_index.query_engine import CitationQueryEngine\n",
    "\n",
    "\n",
    "SimpleWebPageReader = download_loader(\"SimpleWebPageReader\")\n",
    "\n",
    "documents = SimpleWebPageReader(html_to_text=True).load_data(\n",
    "    [\"https://huggingface.co/blog/open-source-llms-as-agents\"]\n",
    ")\n",
    "\n",
    "# Load embedding model\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-large-en-v1.5\")\n",
    "\n",
    "# Pass LLM to pipeline\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, service_context=service_context, show_progress=True\n",
    ")\n",
    "\n",
    "# Query the index\n",
    "query_engine = CitationQueryEngine.from_args(\n",
    "    index,\n",
    "    similarity_top_k=2,\n",
    ")\n",
    "response = query_engine.query(\n",
    "    \"According to this article which open-source model is the best for an agent behaviour?\"\n",
    ")\n",
    "\n",
    "response.response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "完成端点使用后，你可以暂停或删除它。这一步可以通过 UI 完成，或者像下面这样以编程方式完成。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pause our running endpoint\n",
    "endpoint.pause()\n",
    "\n",
    "# optionally delete\n",
    "# endpoint.delete()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
